(54) System for and method of storing data

(57) An adaptive digital tree data structure incorporates a rich pointer object (104, 105, 110-112, 114-118), the rich pointer including both conventional address redirection information (116B) used to traverse the structure and supplementary information (116A) used to optimize tree traversal, skip levels, detect errors, and store state information. The structure of the pointer is flexible so that, instead of storing pointer information, data may be stored in the structure of the pointer itself and thereby referenced without requiring further redirection.
Description

[0001] The present invention relates generally to the 
field of data structures, preferably to a hierarchical data 
organization in which the structure of the data organiza- 
tion is dependent on the data stored and information is 
associated with pointers. 

[0002] The present invention is related to co-pending European Patent Applications Nos. (RJ/N12198), (RJ/N12199) and (RJ/N12336), filed the same day as this application.

[0003] Computer processors and associated memory 
components continue to increase in speed. As hardware 
approaches physical speed limitations, however, other 
methods for generating appreciable decreases in data 
access times are required. Even when such limitations 
are not a factor, maximizing software efficiency maxi- 
mizes the efficiency of the hardware platform, extending 
the capabilities of the hardware/software system as a 
whole. One method of increasing system efficiency is 
by providing effective data management, achieved by 
the appropriate choice of data structure and related stor- 
age and retrieval algorithms. For example, various prior 
art data structures and related storage and retrieval al- 
gorithms have been developed for data management in- 
cluding arrays, hashing, binary trees, AVL trees (height- 
balanced binary trees), b-trees, and skiplists. In each of 
these prior art data structures, and related storage and 
retrieval algorithms, an inherent trade-off has existed 
between providing faster access times and providing 
lower memory overhead. For example, an array allows 
for fast indexing through the calculation of the address 
of a single array element but requires the pre-allocation 
of the entire array in memory before a single value is 
stored, and unused intervals of the array waste memory 
resources. Alternatively, binary trees, AVL trees, b-trees 
and skiplists do not require the pre-allocation of memory 
for the data structure and attempt to minimize allocation
of unused memory but exhibit an access time which in- 
creases as the population increases. 
[0004] An array is a prior art data structure that has a simplified structure and allows for rapid access of the stored data. However, memory must be allocated for the entire array and the structure is inflexible. An array value is looked up "positionally," or "digitally," by multiplying the index by the size (e.g., number of bytes) allocated to each element of the array and adding the base address of the array. Typically, a single Central Processing Unit (CPU) cache line fill is required to access the array element and value stored therein. As described and typically implemented, the array is memory inefficient and relatively inflexible. Access, however, is provided as O(1), i.e., independent of the size of the array (ignoring disk swapping).
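The positional lookup described above reduces to one multiply and one add, while the memory cost is fixed by the array's capacity rather than its population. The following is a minimal Python sketch; the function names, the base address 0x1000 and the 4-byte element size are illustrative assumptions, with machine addresses modelled as plain integers.

```python
def element_address(base: int, index: int, element_size: int) -> int:
    """Address of element `index`: one multiply and one add, whatever the array length."""
    return base + index * element_size


def array_footprint(capacity: int, element_size: int) -> int:
    """Bytes that must be preallocated before the first value can be stored."""
    return capacity * element_size


# Hypothetical array of 4-byte elements at address 0x1000, sized for every
# possible 16-bit index even if only a handful are ever used.
print(hex(element_address(0x1000, 5, 4)))   # 0x1014
print(array_footprint(1 << 16, 4))          # 262144
```

The lookup cost does not grow with the array, but the 256 KiB footprint is paid even for a nearly empty array, which is the trade-off the paragraph describes.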

[0005] Alternatively, other data structures previously mentioned, including binary trees, b-trees, skiplists and hash tables, are available which are more memory efficient but include undesirable features. For example, hashing is used to convert sparse, possibly multi-word indexes (such as strings) into array indexes. The typical hash table is a fixed-size array, and each index into it is the result of a hashing algorithm performed on the original index. However, in order for hashing to be efficient, the hash algorithm must be matched to the indexes which are to be stored. Hash tables also require every data node to contain a copy of (or a pointer to) the original index (key) so you can distinguish nodes in each synonym chain (or other type of list). Like an array, use of hashing requires some preallocation of memory, but it is normally a fraction of the memory that must be allocated for a flat array if the table is well designed, i.e., if the characteristics of the data to be stored are well known, well behaved and matched to the hashing algorithm, collision resolution technique and storage structure implemented.
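To make the point about stored keys concrete, here is a minimal chained hash table in Python. It is a sketch, not a tuned design: Python's built-in `hash()` stands in for the application-matched hash algorithm the text calls for, separate chaining is only one of several collision resolution techniques, and the class name is hypothetical.

```python
class ChainedHashTable:
    """Fixed-size hash table whose buckets are synonym chains of [key, value] nodes."""

    def __init__(self, num_buckets: int = 8):
        # The bucket array is preallocated whether or not any key is stored.
        self.buckets = [[] for _ in range(num_buckets)]

    def _chain(self, key):
        return self.buckets[hash(key) % len(self.buckets)]

    def insert(self, key, value) -> None:
        chain = self._chain(key)
        for node in chain:
            if node[0] == key:          # the stored key tells synonyms apart
                node[1] = value
                return
        chain.append([key, value])

    def lookup(self, key):
        for stored_key, value in self._chain(key):
            if stored_key == key:
                return value
        return None
```

With `num_buckets=1` every key lands in the same synonym chain, yet lookups remain correct only because each node keeps a copy of its original key.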
[0006] In particular, digital trees, or tries, provide rapid access to data, but are generally memory inefficient. Memory efficiency may be enhanced for handling sparse index sets by keeping tree branches narrow, resulting in a deeper tree and an increase in the average number of memory references, indirections, and cache line fills, all resulting in slower access to data. This latter factor, i.e., maximizing cache efficiency, is often ignored when such structures are discussed, yet may be a dominant factor affecting system performance. A trie is a tree of smaller arrays, or branches, where each branch decodes one or more bits of the index. Prior art digital trees have branch nodes that are arrays of simple pointers or addresses. Typically, the size of the pointers or addresses is minimized to improve the memory efficiency of the digital tree.
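A prior-art digital tree of the kind just described can be sketched in a few lines of Python: each branch is an array of 256 simple pointers, one byte of a 32-bit index is decoded per level, and `None` plays the role of the null pointer. The class and helper names are hypothetical, and the 32-bit, one-byte-per-level choice follows the examples used later in this document.

```python
FANOUT = 256      # each branch decodes one byte of the index
INDEX_BYTES = 4   # 32-bit indexes


def digit(index: int, level: int) -> int:
    """Byte decoded at `level`, counting levels from 1 at the top of the tree."""
    return (index >> (8 * (INDEX_BYTES - level))) & 0xFF


class PlainTrie:
    def __init__(self):
        self.root = None          # null root pointer: the tree is empty
        self.slots_allocated = 0

    def _new_branch(self) -> list:
        self.slots_allocated += FANOUT
        return [None] * FANOUT    # array of simple pointers, all null

    def insert(self, index: int, value) -> None:
        if self.root is None:
            self.root = self._new_branch()
        node = self.root
        for level in range(1, INDEX_BYTES):
            slot = digit(index, level)
            if node[slot] is None:
                node[slot] = self._new_branch()
            node = node[slot]
        node[digit(index, INDEX_BYTES)] = value  # bottom branch slot holds the data

    def lookup(self, index: int):
        """Return (value or None, number of branch nodes visited)."""
        node, visited = self.root, 0
        for level in range(1, INDEX_BYTES + 1):
            if node is None:
                return None, visited
            visited += 1
            node = node[digit(index, level)]
        return node, visited
```

Storing one index allocates four 256-slot branches (1024 pointers, 1020 of them null), and every successful lookup visits four nodes: the memory-versus-indirection trade-off the paragraph describes.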

[0007] At the "bottom" of the digital tree, the last branch decodes the last bits of the index, and the element points to some storage specific to the index. The "leaves" of the tree are these memory chunks for specific indexes, which have application-specific structures.
[0008] Digital trees have many advantages, including not requiring memory to be allocated to branches which have no indexes or zero population (also called an empty subexpanse). In this case the pointer which points to the empty subexpanse is given a unique value and is called a null pointer, indicating that it does not represent a valid address value. Additionally, the indexes which are stored in a digital tree are accessible in sorted order, which allows identification of neighbors. An "expanse" of a digital tree as used herein is the range of values which could be stored within the digital tree, while the population of the digital tree is the set of values that are actually stored within the tree. Similarly, the expanse of a branch of a digital tree is the range of indexes which could be stored within the branch, and the population of a branch is the number of values (e.g., count) which are actually stored within the branch. (As used herein, the term "population" refers to either the set of indexes or the count of those indexes, the meaning of the term being apparent to those skilled in the art from the context in which the term is used.)
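The difference between expanse and population can be computed directly. The sketch below assumes 32-bit indexes decoded one byte per level, as in the example of FIGURES 1A - 1E. A node reached by decoding a given byte prefix has a fixed expanse, while its population is whatever actually falls inside it. A population of zero corresponds to a slot holding a null pointer. The function names are hypothetical.

```python
INDEX_BITS = 32


def expanse(prefix: bytes) -> tuple:
    """(lowest, highest) index that could be stored below the node reached by
    decoding the bytes in `prefix` from the top of the tree."""
    remaining_bits = INDEX_BITS - 8 * len(prefix)
    low = int.from_bytes(prefix, "big") << remaining_bits
    return low, low | ((1 << remaining_bits) - 1)


def population(stored, prefix: bytes) -> int:
    """Count of stored indexes that actually fall within that expanse."""
    low, high = expanse(prefix)
    return sum(1 for i in stored if low <= i <= high)
```

For instance, `expanse(b"\xff")` is `(0xFF000000, 0xFFFFFFFF)`, the expanse of the last slot of a 256-way top-level branch, whatever its population.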
[0009] "Adaptive Algorithms for Cache-Efficient Trie Search" by Acharya, Zhu and Shen (1999), the disclosure of which is hereby incorporated herein by reference, describes cache-efficient algorithms for trie search. Each of the algorithms uses different data structures, including a partitioned-array, B-tree, hashtable, and vectors, to represent different nodes in a trie. The data structure selected depends on cache characteristics as well as the fanout of the node. The algorithms further adapt to changes in the fanout at a node by dynamically switching the data structure used to represent the node. Finally, the size and the layout of individual data structures are determined based on the size of the symbols in the alphabet as well as characteristics of the cache(s). The publication further includes an evaluation of the performance of the algorithms on real and simulated memory hierarchies.

[0010] Other publications known and available to those skilled in the art describing data structures include Fundamentals of Data Structures in Pascal, 4th Edition; Horowitz and Sahni; pp 582-594; The Art of Computer Programming, Volume 3; Knuth; pp 490-492; Algorithms in C; Sedgewick; pp 245-256, 265-271; "Fast Algorithms for Sorting and Searching Strings"; Bentley, Sedgewick; "Ternary Search Trees"; 5871926, INSPEC Abstract Number: C9805-6120-003; Dr. Dobb's Journal; "Algorithms for Trie Compaction", ACM Transactions on Database Systems, 9(2):243-63, 1984; "Routing on longest-matching prefixes"; 5217324, INSPEC Abstract Number B9605-6150M-005, C9605-5640-006; "Some results on tries with adaptive branching"; 6845525, INSPEC Abstract Number: C2001-03-6120-024; "Fixed-bucket binary storage trees"; 01998027, INSPEC Abstract Number: C83009879; "DISCS and other related data structures"; 03730613, INSPEC Abstract Number: C90064501; and "Dynamical sources in information theory: a general analysis of trie structures"; 6841374, INSPEC Abstract Number: B2001-03-6110-014, C2001-03-6120-023, the disclosures of which are hereby incorporated herein by reference.
[0011] An enhanced storage structure is described in U.S. Patent Application Serial No. 09/457,164 filed December 8, 1999, entitled "A FAST EFFICIENT ADAPTIVE, HYBRID TREE," (the '164 application) assigned in common with the instant application and hereby incorporated herein by reference in its entirety. The data structure and storage methods described therein provide a self-adapting structure which self-tunes and configures "expanse" based storage nodes to minimize storage requirements and provide efficient, scalable data storage, search and retrieval capabilities. The structure described therein, however, does not take full advantage of certain sparse data situations.
[0012] An enhancement to the storage structure described in the '164 application is detailed in U.S. Patent Application Serial No. 09/725,373, filed November 29, 2000, entitled "A DATA STRUCTURE AND STORAGE AND RETRIEVAL METHOD SUPPORTING ORDINALITY BASED SEARCHING AND DATA RETRIEVAL", assigned in common with the instant application and hereby incorporated herein by reference in its entirety. This latter application describes a data structure and related data storage and retrieval method which rapidly provides a count of elements stored or referenced by a hierarchical structure of ordered elements (e.g., a tree), access to elements based on their ordinal value in the structure, and identification of the ordinality of elements. In an ordered tree implementation of the structure, a count of indexes present in each subtree is stored, i.e., the cardinality of each subtree is stored either at or associated with a higher level node pointing to that subtree or at or associated with the head node of the subtree. In addition to data structure specific requirements (e.g., creation of a new node, reassignment of pointers, balancing, etc.), data insertion and deletion include steps of updating affected counts. Again, however, the structure fails to take full advantage of certain sparse data situations.
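To show why storing subtree counts with the pointers helps, here is a hedged sketch of ordinal selection at a single branch. This is not the '373 application's actual method. The list-of-counts representation and the function name are assumptions. Repeating the step at each level descends to the n-th stored index while skipping whole subtrees without visiting them.

```python
def select_child(child_counts, ordinal: int) -> tuple:
    """Locate the `ordinal`-th stored index (0-based) below a branch using only
    the per-subtree counts kept with the branch's pointers.
    Returns (slot number, ordinal within that slot's subtree)."""
    if ordinal < 0:
        raise IndexError("ordinal must be non-negative")
    for slot, count in enumerate(child_counts):
        if ordinal < count:
            return slot, ordinal
        ordinal -= count       # skip the whole subtree without visiting it
    raise IndexError("ordinal exceeds the branch population")
```

With counts `[0, 3, 0, 2]`, ordinal 3 resolves to `(3, 0)`: the first index of the fourth subtree, found without touching the three indexes in slot 1.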

[0013] The present invention seeks to provide im- 
proved data processing, in the preferred embodiments 
to optimize performance characteristics of a digital tree 
and similar structures. 

[0014] According to an aspect of the present invention there is provided a data structure as specified in claim 1.
[0015] According to another aspect of the present invention there is provided a method of storing indexes in a data structure as specified in claim 6.

[0016] According to another aspect of the present invention there is provided a computer memory as specified in claim 10.

[0017] The preferred system includes a data structure which is stored in the memory, can be treated as a dynamic array, and is accessed through a root pointer. For an empty tree, this root pointer is null; otherwise it points to the first of a hierarchy of branch nodes. Each branch node consists of a plurality of informational or "rich" pointers which subdivide the expanse of the index (key) used to access the data structure. Each rich pointer contains auxiliary information in addition to, or in some cases instead of, the address of (that is, the pointer to) a subsidiary (child) branch or leaf node. This auxiliary information permits various optimizations that result in a positive "return on investment" despite the space required to store the information.
[0018] An informational pointer may contain an address (the actual pointer to a child branch or leaf node); index digits (parts of keys) that help skip levels in the tree or bring leaf information to the present level; population counts that help rapidly count the numbers of valid (stored) indexes in the tree or in any subexpanse (range of indexes); and type information about the next level in the tree, if any, to which the pointer points. Pointers may also provide information for verifying operation and data integrity, and correcting errors. State information may also be bundled with pointers so that the resultant rich pointers provide state information. In this case, the data structure not only provides a means to store and manipulate data, but includes facilities supporting the processes using the structure. The inclusion of this information allows the digital tree to be compressed in various ways that make it smaller, more cache-efficient, and faster to access and modify, even as the branch nodes are potentially no longer simply arrays of pointers to subsidiary nodes. This information also provides structure and redundancies that allow for faster access to and modification of the tree, as well as detection of data corruption.
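The fields listed above can be pictured as a two-word record. The following Python sketch assumes a 64-bit platform and uses an invented bit layout for the second word. The text does not specify field widths, so the 32-bit decode field, the 24-bit population and the 8-bit type shown here, like the class name, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RichPointer:
    address: int     # redirection to a child branch or leaf; 0 stands for null
    decode: int      # index digits already decoded, kept for level skipping
    population: int  # number of indexes stored below this pointer
    ptr_type: int    # 8-bit code describing the object pointed to

    def to_words(self) -> tuple:
        """Pack into two 64-bit words. Assumed layout of word 1:
        bits 32-63 decode, bits 8-31 population, bits 0-7 type."""
        limits = {"ptr_type": 8, "population": 24, "decode": 32, "address": 64}
        for name, bits in limits.items():
            if not 0 <= getattr(self, name) < (1 << bits):
                raise ValueError(f"{name} does not fit in {bits} bits")
        word1 = (self.decode << 32) | (self.population << 8) | self.ptr_type
        return self.address, word1

    @classmethod
    def from_words(cls, word0: int, word1: int) -> "RichPointer":
        return cls(address=word0,
                   decode=word1 >> 32,
                   population=(word1 >> 8) & 0xFFFFFF,
                   ptr_type=word1 & 0xFF)
```

Packing and unpacking round-trip losslessly, so a traversal can read population and type from the pointer it already holds instead of dereferencing the child.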
[0019] Embodiments of the present invention are de- 
scribed below, by way of example only, with reference 
to the accompanying drawings, in which: 

FIGURES 1A - 1E depict a digital tree which in- 
cludes a comparison between prior art pointers and 
an informational pointer for skipping levels in a data 
structure; 

FIGURE 2 is a generalized diagram of an informational pointer incorporating immediate storage of indexes;

FIGURE 3 is a chart showing typical storage capa- 
bilities of informational pointers used to store imme- 
diate indexes; 

FIGURES 4A-4D are diagrams of rich pointers used 
to store 3, 2 and 1 byte immediate indexes on a 
32-bit system; 

FIGURES 5A-5H are diagrams of rich pointers used 
to store 7-1 byte immediate indexes on a 64-bit sys- 
tem; 

FIGURES 6A - 6D are diagrams of rich pointers 
used to store immediate indexes and associated 
values on a 64-bit system; 
FIGURES 7A - 7E depict a digital tree which in- 
cludes a comparison between indexes stored in leaf 
nodes and informational pointers used as immedi- 
ate indexes; and 

FIGURE 8 is a block diagram of a computer system 
in which the data structure may be implemented. 

[0020] As previously described, typical digital trees exhibit several disadvantages. These disadvantages include memory allocated to null pointers associated with empty branches, while exhibiting an increased number of memory references or indirections, and possibly cache line fills, as the size (i.e., "fanout") of the branches narrows to reduce the number of these null pointers. These disadvantages associated with digital trees have limited their use in prior computer applications.
[0021] The described embodiment combines the advantages of the digital tree with smarter approaches to handling both non-terminal nodes (branches) and terminal nodes (leaves) in the tree. These smarter approaches minimize both memory space and processing time for lookups, insertions and modifications of data stored in the data structure. Additionally, the embodiment ensures that the data structure remains efficient as indexes are added to or deleted from the data structure. The approaches used by this embodiment include forms of data compression and compaction, and help reduce the memory required for the data structure, minimize the number of cache line fills required, and reduce access and retrieval times.

[0022] The described system replaces the simple pointers typically implemented in digital trees with "rich" pointers (herein termed "informational pointers" and used interchangeably therewith) which associate additional information with the redirection or address information of the pointers. This additional information may be used by the data structure and/or by processes accessing the structure. The use of rich pointers within the digital tree permits various optimizations within the data structure. In a preferred embodiment of the invention each rich pointer in a digital tree branch includes multiple segments or portions, typically occupying two words (dependent upon the target platform). The rich pointer may contain an address (the actual pointer), index digits (parts of keys), population counts, type information concerning the next level to which the pointer "points" or is directed within the tree, redundant data supporting error detection, state information, etc.
[0023] One type of rich pointer is a narrow-expanse pointer. In particular, one type of data compression that may be used when an expanse is populated by a "dense cluster" of indexes that all have some leading bits in common is supported by a narrow-expanse pointer according to the present invention. The typical representation of the common bits through multiple digital tree branches (or redundant bits in leaves) can be replaced by uniquely representing (e.g., encoding) the common bits as part of, or in association with, the pointer to the branch or leaf. In a preferred embodiment of the present invention, this type of data compression is limited to common leading whole bytes. The common bits are stored in a rich pointer, and the pointer type indicates the level of, and the number of remaining undecoded digits in, the next object. The remaining undecoded digits imply the number of levels skipped by the narrow pointer. The rich pointer is stored (i.e., associated) with the pointer to the next level, which has an expanse smaller than it would otherwise. Preferably, each subexpanse pointer contains a "decode" field that holds all index bytes decoded so far except for the first byte. Narrow pointers provide a method to skip levels within a digital tree, which saves memory used to store the indexes and reduces the number of memory references and cache fills required.
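A narrow-expanse pointer stands in for branches that would each have decoded one byte. For that reason, a lookup has to compare the index against the stored common digits before jumping ahead. This is a sketch under the one-byte-per-level, 32-bit convention of FIGURES 1A - 1E. The function and parameter names are hypothetical, and the common digits are passed in explicitly rather than read from a real decode field.

```python
INDEX_BYTES = 4  # 32-bit indexes


def digit(index: int, level: int) -> int:
    """Byte decoded at `level` (level 1 = most significant byte)."""
    return (index >> (8 * (INDEX_BYTES - level))) & 0xFF


def follow_narrow_pointer(index: int, skipped: bytes, first_skipped_level: int):
    """Return the level at which traversal resumes, or None if `index` cannot
    be in the narrowed expanse. `skipped` holds the common digits that the
    omitted branches would have decoded, starting at `first_skipped_level`."""
    for offset, expected in enumerate(skipped):
        if digit(index, first_skipped_level + offset) != expected:
            return None  # the index lies outside the dense cluster: absent
    return first_skipped_level + len(skipped)
```

The comparison uses only data already held in the pointer, so each skipped level saves a memory reference. A mismatch also ends an unsuccessful search early.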
[0024] FIGURES 1A - 1E depict the use of narrow-expanse pointers in a digital tree. (For the purposes of the present illustration, examples of the data structure are given with reference to a 32-bit word size platform wherein indexes are single words (as opposed, e.g., to character strings of arbitrary length), although it is understood that the system is not so limited and, to the contrary, encompasses other word sizes and configurations including, but not limited to, 16, 32, 64 and 128-bit word sizes.) As used herein, the term "slot" refers to a record or group of cells of an array associated with, and/or including, a pointer to a child node or, more generally, a subexpanse of indexes, together with any data associated with the pointer. Generally, the array is "indexed" so that each cell or "slot" is associated with an offset value corresponding to an ordinal value of the slot within the array. Thus, in further detail, root pointer node 101 is used for accessing the underlying data structure of the digital tree. Root pointer node 101 includes address information 102, diagrammatically shown as an arrow pointing to a first or "top" level node 103, in this illustration a branch node. (Note, the terminology used herein labels the top node of a tree pointed to by the root as "level 1"; children of the level 1 node are designated as "level 2" nodes, etc. According to this convention, the level of any branch or leaf node is equal to one more than the number of digits (bytes) decoded in the indexes stored above that node. It is further noted that this convention, while representative, is for purposes of the present explanation, and other conventions may be adopted including, for example, designating leaf nodes as constituting a first level of the tree. In this latter case, a preferred embodiment of the invention, the level of any branch or leaf node is equal to the number of digits (bytes) remaining to decode in the indexes stored at or below that node.) First level node 103 includes slots or enhanced pointer arrays for up to 256 lower level nodes and represents the entire expanse of the data structure, i.e., indexes 00000000 through FFFFFFFF hex, by implementing a 256-way branch. (Note that, although a preferred embodiment decodes 1 byte of the index at each branch, other divisions of the index may be used including, for example, decoding 4 bits to implement a 16-way branch at each level of the tree, etc.) First level node 103 includes first slot 104 (containing an adaptable object), which corresponds to expanse 00000000-00FFFFFF, and last slot 105, which corresponds to a final expanse portion including indexes FF000000-FFFFFFFF. The pointer contained in the pointer field in slot 104 points to a first one of 256 of the next level subexpanses (level 2 in the digital tree), while the pointer in slot 105 points to the most significant upper 1/256th of level 2.

[0025] The first subexpanse of level 2 includes subsidiary node 108, in turn including an array of 256 pointers directed to lower level nodes 119 and 120. As shown, the expanse covered by node 108 (i.e., an index range of 00000000-00FFFFFF hex) is only sparsely populated by indexes falling within the subexpanse ranges covered by third level nodes 119 and 120 (i.e., 00000000-0000FFFF and 00100000-0010FFFF hex, respectively). Thus, while the pointers in slots 110 and 112 include valid redirection information to (i.e., addresses of) nodes 119 and 120, the remaining 254 pointers of node 108, including the pointer in slot 111 covering an uppermost expanse range of 00FF0000-00FFFFFF hex, are null pointers, i.e., have a special value reserved for pointers that are not directed to any target location or to empty nodes. Note that node 120 is similarly sparsely populated, with all indexes falling within a single subexpanse node 121 associated with a range of 00100200-001002FF hex and pointed to by the sole active pointer in node 120, that is pointer 122. Thus, not only does node 120 require the allocation of additional storage space for 256 pointers, but reaching the indexes stored below it in the leaf nodes requires two indirections and therefore two cache fills.
[0026] Thus, as pictured, slot 110 contains a pointer to a level 3 node which corresponds to 00000000-0000FFFF. Additionally, slot 112 contains a pointer which points to a separate subexpanse 120 of level 3 which correlates to 00100000-0010FFFF. Similarly, slots within level 3 may further point to a subexpanse at level 4. Operationally, level 4 of FIGURES 1A - 1E is reached by consecutive decoding of one-byte portions of the index and traversing the tree in accordance with the decoded values. The first byte (00) is used to identify slot 104, which contains the corresponding pointer to traverse the tree from level 1 to the corresponding portion of level 2, i.e., the node addressed by the pointer of slot 104. The next byte (10) is used to identify slot 112, which contains the corresponding pointer to traverse the tree from node 108 to subsidiary node 120 at level 3. The next byte (02) is used to identify slot 122, which contains the corresponding pointer to traverse the tree from node 120 of level 3 to node 121 of level 4. Once at level 4, the remaining byte is used to access the appropriate slot of node 121 to retrieve the data associated with the index value. As described, this process requires four separate memory references and potentially four different cache fills to identify the correct memory address which corresponds to the index.
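The walk above consumes one digit per level, so the number of levels (and, without rich pointers, memory references) follows from the digit width. The helper below is hypothetical. It lists the digits decoded from the top of the tree down. The value 0x00100234 is an illustrative index inside node 121's range 00100200-001002FF. The helper also accepts 4-bit digits, the 16-way alternative noted in [0024], which doubles the number of levels.

```python
def digits(index: int, bits_per_level: int = 8, index_bits: int = 32) -> list:
    """Digits decoded from the top of the tree down, one per level."""
    if index_bits % bits_per_level:
        raise ValueError("index width must be a multiple of the digit width")
    mask = (1 << bits_per_level) - 1
    return [(index >> shift) & mask
            for shift in range(index_bits - bits_per_level, -1, -bits_per_level)]
```

With the default one-byte digits, `digits(0x00100234)` gives `[0x00, 0x10, 0x02, 0x34]`: slot 104, slot 112, slot 122, then the slot within node 121, exactly four steps.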
[0027] If an expanse, or subexpanse, is sparsely populated with a small number of dense clusters of subsidiary indexes, a rich pointer may be used to encode the common bits of the populated subexpanse or indexes. Still referring to FIGURES 1A - 1E, the upper 1/256 subexpanse of level 2, subsidiary node 109, contains a dense cluster of indexes which each lie within the range of FF100200-FF1002FF. The other portions of the upper 1/256 subexpanse, FF100000-FF1001FF and FF100300-FF10FFFF, do not contain indexes. In this case, a rich pointer can be used to point directly to the level 4 portion of the subexpanse, skipping level 3 and eliminating the need for a memory reference or indirection to level 3. Specifically, the corresponding slot 116 contains a rich pointer which includes an information data field 116A and a pointer 116B to the next subexpanse or other structure for accessing the subsidiary indexes. The information data field 116A includes the common bytes (i.e., index portion) of the remaining indexes, 02, because the remaining indexes all fall within the range of FF100200-FF1002FF.
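Whether a narrow-expanse pointer applies comes down to how many whole leading bytes a cluster shares, the preferred limitation stated in [0023]. The following sketch computes that for 32-bit indexes. The function name is hypothetical, and the sample cluster simply uses values from the range FF100200-FF1002FF.

```python
def common_leading_bytes(indexes, index_bytes: int = 4) -> bytes:
    """Whole leading bytes shared by every index in a non-empty cluster."""
    if not indexes:
        raise ValueError("cluster is empty")
    first = indexes[0].to_bytes(index_bytes, "big")
    shared = index_bytes
    for other in indexes[1:]:
        candidate = other.to_bytes(index_bytes, "big")
        k = 0
        while k < shared and candidate[k] == first[k]:
            k += 1
        shared = k
    return first[:shared]
```

For the cluster above the result is `FF 10 02`. Levels 1 and 2 already decode `FF` and `10` on the way to slot 116, so only the remaining shared byte `02` needs to travel with the pointer, which is the value the text places in field 116A.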
[0028] In this case, the rich pointer is used to eliminate one of the memory references and possibly one cache fill. The first byte (FF) of the index is used to traverse from level 1 of the tree to the appropriate portion of level 2. Once at the level 2 node, the rich pointer is used to traverse from the level 2 node directly to the level 4 node.

[0029] The rich pointer structure encompasses at least two types of rich pointers or adaptable objects, including a pointer type as described above and an immediate type. The immediate type supports immediate leaves or immediate indexes. That is, when the population of an expanse is relatively sparse, a rich pointer is used to store the indexes "immediately" within a digital tree branch, rather than requiring traversal of the digital tree down to the lowest level to access the index. This format is akin to the "immediate" machine instruction format wherein an instruction specifies an immediate operand which immediately follows any displacement bytes. Thus, an immediate index or a small number of indexes are stored in the node, avoiding one or more redirections otherwise required to traverse the tree and arrive at some distant leaf node. Immediate indexes thereby provide a way of packing small populations (or a small number of indexes) directly into a rich pointer structure instead of allocating more memory and requiring multiple memory references and possible cache fills to access the data.

[0030] The two-word format of the preferred embodiment readily supports the inclusion of immediate indexes. Within the rich pointer, this is accomplished by storing index digits in the information data field. A rich pointer implemented in a 32-bit system may store anywhere from a single 3-byte immediate index up to seven 1-byte indexes, while a rich pointer in a 64-bit system may store up to 15 1-byte immediate indexes. The generalized structure of a rich pointer (also referred to as an adaptable object) supporting immediate indexes is shown in FIGURE 2. The rich pointer includes one or more indexes I_n, depending on the word size of the platform and the size of the index, and an 8-bit Type field that also encodes the index size and the number of immediate indexes.
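The immediate layout can be sketched as a 16-byte object (two 64-bit words) whose last byte is the Type field and whose remaining 15 bytes hold packed index digits. This is a presence-only sketch under stated assumptions. The encoding `(size << 4) | count` for the Type byte and the big-endian, sorted packing are invented for illustration, and the single-index exception described in [0033] is ignored here. Under this assumed layout the capacity is `15 // size`, which gives the 15 one-byte indexes the text cites for a 64-bit system.

```python
POINTER_BYTES = 16  # two 64-bit words


def pack_immediate(indexes, size: int) -> bytes:
    """Pack `size`-byte index digits from the first byte; the last byte is the
    type (hypothetical encoding: high nibble = size, low nibble = count)."""
    if not 1 <= size <= 7:
        raise ValueError("index size must be 1..7 bytes")
    capacity = (POINTER_BYTES - 1) // size
    if not 1 <= len(indexes) <= capacity:
        raise ValueError(f"{len(indexes)} indexes of {size} bytes do not fit")
    body = b"".join(i.to_bytes(size, "big") for i in sorted(indexes))
    body = body.ljust(POINTER_BYTES - 1, b"\x00")   # unused storage
    return body + bytes([(size << 4) | len(indexes)])


def unpack_immediate(rp: bytes) -> list:
    size, count = rp[-1] >> 4, rp[-1] & 0x0F
    return [int.from_bytes(rp[k * size:(k + 1) * size], "big")
            for k in range(count)]
```

Reading the Type byte alone tells a lookup how to interpret the other 15 bytes, so no further redirection is needed to test membership.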

[0031] As mentioned, the number of immediate indexes stored depends upon the word size of the platform and the size of the indexes, with upper levels within the tree, nearest the root, requiring larger indexes, and smaller indexes being found as the tree is traversed toward the leaves (since fewer index bytes remain to be decoded at each successive level). Examples of numbers of immediate index values of various sizes accommodated by 32-bit and 64-bit machines according to a preferred embodiment are presented in FIGURE 3, wherein indexes are mapped to valid/invalid indicators and have no associated values. FIGURES 4A-4D illustrate 3, 2 and 1-byte index sizes stored in an immediate rich pointer structure implemented on a 32-bit platform, while FIGURES 5A-5H illustrate index sizes of 7 through 1 byte implemented on a 64-bit machine. The structures of FIGURES 4A-4D and 5A-5H are also directed to an embodiment of the invention in which only the presence or absence of an index is indicated, without any other value being associated with the indexes.



[0032] FIGURES 6A - 6D illustrate another embodiment of the invention on a 64-bit machine wherein a value is associated with each index I_n. According to this embodiment, when a single immediate index I_1 of up to 7 bytes is stored in a rich pointer structure, a 64-bit value associated with the index is also stored, as shown in FIGURE 6A. However, if more than one immediate index is to be stored, such as when indexes may be represented by 3-byte, 2-byte or 1-byte digits (FIGURES 6B - 6D, respectively), then the first 8-byte word of the rich pointer is instead used as a pointer to values associated with the respective multiple indexes. A similar configuration is used to store values associated with indexes when the invention is implemented on a 32-bit machine.
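The two cases can be modelled in Python by letting the first word hold either the value itself or a reference to a separate value array. This is a hedged model, not a bit-level layout. The class and function names are hypothetical, a Python list stands in for the pointed-to array of values, and the `indexes` list stands in for the index area of the rich pointer.

```python
from dataclasses import dataclass


@dataclass
class ImmediateWithValues:
    """Immediate rich pointer for an index-to-value map on a 64-bit machine.
    word0 is the 64-bit value itself when exactly one index is stored;
    otherwise it stands for a pointer to a separate array of values
    (modelled here as a Python list). `indexes` models the index area."""
    word0: object
    indexes: list

    def get(self, key: int):
        if key not in self.indexes:
            return None
        if len(self.indexes) == 1:
            return self.word0                          # value held in place
        return self.word0[self.indexes.index(key)]     # one extra redirection


def make_immediate(pairs: dict) -> ImmediateWithValues:
    keys = sorted(pairs)
    if len(keys) == 1:
        return ImmediateWithValues(word0=pairs[keys[0]], indexes=keys)
    return ImmediateWithValues(word0=[pairs[k] for k in keys], indexes=keys)
```

A single index costs no redirection at all. Several indexes share one redirection to their value array, which is still cheaper than descending to a separate leaf.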

[0033] Immediate indexes are packed into rich pointers starting at the "first byte" (farthest from the type field), possibly leaving some unused storage. An exception is present in a preferred embodiment wherein, if a single immediate index is stored, the index begins at the first byte of the second word, to allow the first word to be a value area corresponding to the index for those arrays that map an index to a value (see FIGURES 4A, 5A and 6A). The structure of an ordinary leaf and the indexes portion of a rich pointer containing immediate indexes are identical once the starting address, index size and population are known.
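Because the index area of an immediate rich pointer is laid out like an ordinary leaf, one search routine can serve both once it is given the starting offset, index size and population. A hedged sketch: big-endian packing and binary search are assumptions (the text fixes neither), the function name is hypothetical, and the 16-byte example follows the single-index exception above by starting the index at the first byte of the second 8-byte word.

```python
def search_packed(buf: bytes, start: int, size: int, population: int, key: int) -> int:
    """Binary-search sorted, big-endian, `size`-byte indexes packed at buf[start:].
    Returns the position of `key`, or -1. The same routine serves a leaf node
    and the index area of an immediate rich pointer."""
    lo, hi = 0, population - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        off = start + mid * size
        value = int.from_bytes(buf[off:off + size], "big")
        if value == key:
            return mid
        if value < key:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1


# A 3-index leaf of 2-byte digits, and a 16-byte immediate pointer holding one
# 3-byte index at byte 8, after an 8-byte value word (the last byte is a type).
leaf = bytes([0x02, 0x00, 0x02, 0x10, 0x02, 0x34])
immediate = bytes(8) + (0x001234).to_bytes(3, "big") + bytes(4) + bytes([0x31])
```

Only the parameters differ between the two calls, so the same code path serves leaves and immediates.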
[0034] Thus, as described, an immediate index rich pointer structure may be thought of as including a small leaf. Such a structure is particularly helpful to represent a sparsely populated expanse where the indexes reside in the rich pointer itself.

[0035] FIGURES 7A - 7E illustrate a comparison between typical pointers and a rich pointer which can be used to store immediate indexes. The indexes would typically be stored in the portion of level 4 node 121 in the corresponding array cell or "slot." By using a rich pointer as an immediate index, the indexes that would otherwise reside in a leaf node such as leaf node 701 are instead stored in the corresponding portion of a higher level node, e.g., level 2 node 109. For a 64-bit system, one or more indexes can be stored in slot 116 of level 2 node 109 in immediate index data field 702. As diagrammed, slot 116 is logically divided into multiple subslots, each storing an immediate index. The use of rich pointers as immediate indexes avoids at least one memory reference and one or more cache line fills.
[0036] Another use of the informational fields available with rich pointers is to store state information associated with the object referenced by the pointer, or otherwise to describe and/or store state information such as the state of the procedure accessing the structure. Thus, while the tree itself is not a state machine, when combined with a specified index to insert, delete, or retrieve, it may be used as input to the accessing process, allowing the code to operate similarly to a state machine. Each tree subexpanse pointer includes a "type" field (e.g., 8 bits) that encodes one of a large number (e.g., 256) of enumerated object types. Levels in the tree are directly encoded in the enumerations, so rich pointers allow a large number of very specific next-level object types. The tree traversal code can be thought of as a state machine whose inputs are the index/key to decode and the nodes of the tree. In a preferred embodiment, the state machine algorithm provides software that appears to be a single large switch, allowing the software to act as a collection of small, fast code chunks, each pre-optimized to perform one task well with minimum run-time computation. Since the tree level is encoded in each pointer type, the traversal code does not need to track the current tree level. Instead, this state information is stored within the nodes of the tree itself.
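A toy C sketch of the "single large switch" traversal follows. It assumes hypothetical 16-bit indexes, a root branch that decodes the high byte, and three child types (bitmap leaf, single immediate, null). Each enumerated type names both the object kind and the level it decodes, so the loop carries no level counter. None of these names or layouts come from the source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical type codes: each encodes object kind AND tree level. */
enum rp_type { T_NULL, T_BRANCH_L2, T_LEAF_L1, T_IMM_L1 };

typedef struct {
    enum rp_type type;
    uint8_t imm;          /* T_IMM_L1: the low index byte, stored in place */
    const void *next;     /* T_BRANCH_L2 / T_LEAF_L1: redirection */
} rptr;

typedef struct { rptr slot[256]; } branch;        /* decodes the high byte */
typedef struct { uint64_t bits[4]; } bitmap_leaf;  /* decodes the low byte  */

/* Each case is a small, specialized code chunk; the pointer type alone
 * selects the next state, so no level variable is maintained. */
static bool lookup16(const rptr *rp, uint16_t key)
{
    for (;;) {
        switch (rp->type) {
        case T_BRANCH_L2: {
            const branch *b = rp->next;
            rp = &b->slot[key >> 8];
            continue;                     /* next iteration of the for loop */
        }
        case T_LEAF_L1: {
            const bitmap_leaf *l = rp->next;
            uint8_t lo = key & 0xFF;
            return (l->bits[lo >> 6] >> (lo & 63)) & 1;
        }
        case T_IMM_L1:
            return rp->imm == (key & 0xFF);
        case T_NULL:
            return false;
        default:
            assert(!"corrupt rich pointer type");
            return false;
        }
    }
}
```

Supporting another object type means adding one more case, which is how a large enumeration keeps each code chunk small and specialized.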
[0037] Rich pointers may also be used to detect errors 
in data, interpret and process data, etc., by providing 
information redundancy. That is, data characterizing the 
information stored as part of a rich pointer may be used 
to detect errors in the referenced data much as an error 
detection code might be used. This information may also 
be used to confirm, for example, position in a tree by 
encoding level or similar information in association with 
each pointer to form a rich pointer. 
[0038] In particular, in practice it is neither feasible nor desirable to compress all unused bits out of a rich pointer. Machine instruction efficiency depends partly on word and byte boundaries. While the ratio of cache fill time to CPU instruction time is high enough that cache efficiency is important, it is generally still low compared with the ratio that motivates other data compression methods, for example those directed to minimizing disk reads. (Cache-efficient programs must balance CPU time against "complete" data compression.) The result of "incomplete" compression is to provide and use some redundant data in rich pointers that allows tree traversal code to opportunistically, but very "cheaply," detect and report many types of data corruption in the tree itself, resulting either from tree management code defects or from external accidents. In the preferred embodiment, cheaply detected corruptions may result in a void pointer return, while "expensive" detections result in assertion failures in debug code only and are ignored in production code. The unused bits of a rich pointer may thus be used to opportunistically detect various types of data corruption. As used herein, error detection data refers to any redundant data available to detect data corruption, whether that data is stored solely for the purpose of error detection or is stored for other (functional) purposes but has a secondary use in detecting errors.
[0039] For example, an error condition may be identified by checking that a pointer type matches the tree level; it is inappropriate, for example, for certain objects such as a "Leaf1" object to appear at other than the lowest level of the tree, furthest from the root. With reference to FIGURES 7A - 7E, if type field 703 contains an invalid value, such as 255, an invalid rich pointer type is indicated and appropriate error processing is performed.
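The type-versus-level check costs one table lookup per traversal step. This C sketch assumes hypothetical 3-byte indexes and five type codes; the table type_level and the function rp_type_ok are illustrative. The debug-only assert mirrors the split between cheap detections (a failed return) and checks that run in debug code only.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical type codes for a tree over 3-byte indexes. */
enum { TY_NULL, TY_BRANCH_L3, TY_BRANCH_L2, TY_LEAF1, TY_IMM1, TY_COUNT };

/* Level of the object each type refers to; 0 marks "legal at any level". */
static const uint8_t type_level[TY_COUNT] = {
    [TY_NULL] = 0, [TY_BRANCH_L3] = 3, [TY_BRANCH_L2] = 2,
    [TY_LEAF1] = 1, [TY_IMM1] = 1,
};

/* Cheap check run on every step. `level` is the level this pointer should
 * lead to, known from the type of the branch that holds it.
 * Returns 1 if consistent, 0 if the pointer indicates corruption. */
static int rp_type_ok(uint8_t type, unsigned level)
{
    assert(level >= 1 && level <= 3);   /* caller check, debug builds only */
    if (type >= TY_COUNT)               /* e.g. 255: not an enumerated type */
        return 0;
    return type_level[type] == 0 || type_level[type] == level;
}
```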



[0040] Another check is performed on decode bytes in subexpanse pointers, which include already-decoded index bytes that are not required as part of a narrow pointer but nonetheless must match the path taken to this point in the tree. It is simpler and more efficient to store already-decoded index bytes this way than to optimize storage to include only the required narrow-pointer bytes.
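The decode-byte check can be sketched in C as below, assuming hypothetical 32-bit indexes decoded one byte per level (level 4 at the root, level 1 at the leaves) and a sub_ptr whose decode field stores every index byte above its target's level. In this sketch a mismatch in skipped (narrow-pointer) bytes means the index is absent, whereas a mismatch in bytes already fixed by the path indicates corruption. All names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint32_t decode;  /* index bytes above the target's level */
    uint8_t level;    /* level of the subsidiary node, 1..3 */
} sub_ptr;

enum decode_result { DECODE_MATCH, DECODE_ABSENT, DECODE_CORRUPT };

/* Mask selecting the index bytes above `level` in a 32-bit index. */
static uint32_t above_mask(unsigned level)
{
    assert(level >= 1 && level <= 4);
    return level == 4 ? 0 : (uint32_t)(0xFFFFFFFFu << (8 * level));
}

/* parent_level: level of the branch holding p; bytes at that level and
 * above were decoded on the path taken to reach p. */
static enum decode_result check_decode(const sub_ptr *p, unsigned parent_level,
                                       uint32_t key)
{
    assert(p->level >= 1 && p->level < parent_level && parent_level <= 4);
    uint32_t all  = above_mask(p->level);          /* every byte stored in decode */
    uint32_t path = above_mask(parent_level - 1);  /* bytes fixed by the path */
    uint32_t diff = (key ^ p->decode) & all;
    if (diff & path)
        return DECODE_CORRUPT;   /* redundant bytes disagree with the path */
    if (diff)
        return DECODE_ABSENT;    /* skipped (narrow) bytes differ */
    return DECODE_MATCH;
}
```

For a pointer held in the root branch (level 4) that skips level 3 and leads to a level-2 node with decode 0x12340000, the key 0x12345678 matches, 0x12995678 is absent, and 0x99345678 could only arrive there if the tree were corrupt.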

[0041] Rich pointers also allow computational efficiencies. In particular, when a single immediate index is stored in a rich pointer, there is room (e.g., in the Decode field) to store all but the first byte of the index, not just the remaining undecoded bytes. This allows faster traversal and modification. Like decode bytes, these redundant bytes must agree with the path traversed to the immediate index.
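The following C sketch illustrates this under assumed conditions: hypothetical 32-bit indexes whose first (most significant) byte is decoded by the root branch, and an immediate that stores the remaining three bytes regardless of how deep it sits. The type imm_one and its functions are illustrative, not taken from the source.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint32_t low3;   /* all but the first byte: index & 0x00FFFFFF */
} imm_one;

static imm_one imm_one_make(uint32_t index)
{
    return (imm_one){ index & 0x00FFFFFFu };
}

/* decoded: index bytes already decoded on the path (1..3), counting the first.
 * Returns 1 if key is the stored index, 0 if not, and -1 if the redundant
 * (already-decoded) bytes disagree with the path, i.e. the tree is corrupt. */
static int imm_one_lookup(imm_one im, uint32_t key, unsigned decoded)
{
    assert(decoded >= 1 && decoded <= 3);
    uint32_t low3 = key & 0x00FFFFFFu;
    uint32_t redundant = 0x00FFFFFFu & ~(0xFFFFFFFFu >> (8 * decoded));
    if ((low3 ^ im.low3) & redundant)
        return -1;
    return low3 == im.low3;      /* a hit is one masked comparison */
}

/* Rebuild the full index, e.g. for iteration. */
static uint32_t imm_one_index(imm_one im, uint8_t first_byte)
{
    return ((uint32_t)first_byte << 24) | im.low3;
}
```

Because all three low bytes are present, the immediate can be moved to a different depth during modification without recomputing its contents; only the decoded count supplied by the caller changes.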

[0042] Rich pointers also support pointer portability. 
That is, when a narrow-expanse pointer indicates only
the level of the subsidiary node, rather than the number 
of levels being skipped, it remains "portable". Like any 
rich pointer that "knows" about the object to which it re- 
fers but not about the object in which it resides, a port- 
able narrow-expanse pointer allows easier branch in- 
sertion and deletion when an "outlier" index is inserted 
or deleted. (An outlier is an index that belongs under the 
full subexpanse of the slot occupied by the narrow-ex- 
panse pointer, but not under the present narrow ex- 
panse of that pointer.) 
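The portability argument can be made concrete with two hypothetical encodings of a narrow-expanse pointer in C: one storing the subsidiary node's level (portable) and one storing the number of levels skipped (relative). Levels are numbered upward from 1 at the leaves; all names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { uint8_t target_level; } portable_np;  /* level of subsidiary node */
typedef struct { uint8_t skip; } relative_np;          /* levels skipped below parent */

/* Portable form: the skip is derived from where the pointer currently
 * resides, so the pointer itself never changes when it is moved. */
static unsigned portable_skip(portable_np p, unsigned parent_level)
{
    assert(p.target_level < parent_level);
    return parent_level - 1u - p.target_level;
}

/* Relative form: must be rewritten whenever the pointer moves to a parent
 * at a different level, e.g. when inserting an outlier places a new
 * branch between the old parent and the subsidiary node. */
static relative_np relative_move(relative_np r, unsigned old_parent,
                                 unsigned new_parent)
{
    assert(r.skip < old_parent);
    unsigned target = old_parent - 1u - r.skip;
    assert(target < new_parent);
    return (relative_np){ (uint8_t)(new_parent - 1u - target) };
}
```

For instance, a pointer in a level-4 branch leading to a level-1 leaf skips two levels. If inserting an outlier places a new level-2 branch above that leaf, the portable pointer moves into the new branch unchanged, while the relative one must be rewritten from a skip of 2 to a skip of 0.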

[0043] FIGURE 8 is a diagram of a computer system 
capable of supporting and running a memory storage 
program implementing and maintaining a data structure 
as taught herein. Thus, although the structure is adapt- 
able to a wide range of data structures, programming 
languages, operating systems and hardware platforms 
and systems, FIGURE 8 illustrates one such computer system 800 comprising a platform suitable to support the structure. Computer system 800 includes Central Processing Unit (CPU) 801 coupled to system bus 802. CPU 801 may be any general purpose CPU, such as an HP PA-8500 or Intel Pentium processor. However, the system is not restricted by the architecture of CPU 801 as long as CPU 801 supports the operations described herein, e.g., the use of pointers. System bus 802 is coupled to Random Access Memory (RAM) 803, which may be SRAM, DRAM or SDRAM. ROM 804, which may be PROM, EPROM, or EEPROM, is also coupled to system bus 802. RAM 803 and ROM 804 hold user and system data and programs as is well known in the art.

[0044] System bus 802 is also coupled to Input/Output (I/O) controller card 805, communications adapter card 811, user interface card 808, and display card 809. I/O card 805 connects storage devices 806, such as one or more of a hard drive, a CD drive, a floppy disk drive, and a tape drive, to the computer system. Communications card 811 is adapted to couple computer system 800 to network 812, which may be one or more of a telephone network, a Local-Area (LAN) and/or Wide-Area (WAN) network, an Ethernet network, and/or the Internet, and can be wireline or wireless. User interface card 808 couples user input devices, such as keyboard 813 and pointing device 807, to computer system 800. Display card 809 is driven by CPU 801 to control display device 810.

[0045] The disclosures in United States patent application No. 09/874,788, from which this application claims priority, and in the abstract accompanying this application are incorporated herein by reference.



Claims 

1. A data structure for storage of indexes in a computer memory, including:

a hierarchy of branch nodes (103, 108, 109, 119, 120, 121, 123) ordered into a plurality of levels beginning with a top level branch (103), each of said branch nodes including an array of adaptable objects (104, 105, 110, 112, 111, 114-118) each associated with a subexpanse of said indexes mapped by a respective one of said branch nodes, said adaptable objects each including a type field (T) indicating a type of said adaptable object, said type including a pointer type in which said adaptable object is configured to include a pointer (116B) to another node and an information data field (116A) configured to store information about said other node, and an immediate type in which at least one of said indexes is stored in said adaptable object.

2. A structure according to claim 1, wherein said information data field (116A) represents an index portion common to subsidiary ones of said indexes such that a subsidiary node (123) is more than one level lower in the data structure than its parent (109) and does not encode a common portion of said subsidiary indexes.

3. A structure according to claim 1 or 2, wherein a pointer field (116B) associated with one of said adaptable objects (116) of said pointer type at one level of said data structure is directed to another of said branch nodes (123) at another level of said data structure that is removed from said one level by at least two levels and said information data field of said adaptable object includes a portion of a plurality of said indexes common to all of said indexes in a subexpanse associated with said adaptable object.

4. A structure according to claim 1, 2 or 3, including a plurality of leaf nodes (701) associated with one or more of the indexes, wherein a pointer field associated with one of said adaptable objects of said pointer type at one level of said data structure is directed to one of said leaf nodes at another level of said data structure that is removed from said one level by at least two levels and said information data field of said adaptable object includes a portion of a plurality of said indexes common to all of said indexes in a subexpanse residing in said one leaf node.

5. A structure according to any preceding claim, wherein at least a part of a said adaptable object includes an immediate index data field (702) configured to represent at least a portion of at least one subsidiary index, such that said subsidiary index is immediately present without further indirection through a pointer to a different location in said computer memory.

6. A method of storing indexes in a data structure, including the steps of:

defining a data structure including a hierarchy of branch nodes (103, 108, 109, 119, 120, 121, 123) ordered into a plurality of levels beginning with a top level branch (103), each of said branch nodes including an array of adaptable objects (104, 105, 110, 112, 111, 114-118) each associated with a subexpanse of said indexes mapped by a respective one of said branch nodes, said adaptable objects each including a type field (T) indicating a type of said adaptable object, said type including a pointer type (116B) in which said adaptable object is configured to include a pointer to another node and an information data field (116A) configured to store information about said other node, and an immediate type in which at least one of said indexes is stored in said adaptable object; and storing the indexes in the data structure.

7. A method according to claim 6, including the step of defining said data structure to include a plurality of leaf nodes (701) associated with one or more of the indexes, wherein a pointer field (104) associated with one of said adaptable objects of said pointer type at one level of said data structure is directed to one of said leaf nodes (701) at another level of said data structure that is removed from said one level by at least two levels and said information data field of said adaptable object includes a portion of a plurality of said indexes common to all of said indexes in a subexpanse residing in said one leaf node.

8. A method as in claim 6 or 7, including the step of configuring at least a part of a said adaptable object to include an immediate index data field (702) representing at least a portion of at least one subsidiary index, such that said subsidiary index is immediately present without further indirection through a pointer to a different location in said computer memory.

9. A method as in claim 6, 7 or 8, including the steps of storing error detection data (703) in said adaptable objects and detecting an error condition within said data structure using said error detection data.

10. A computer memory (803) for storing data for access by an application program executed on a data processing system, including:

a data structure stored in said memory for storage of indexes, said data structure including a hierarchy of branch nodes (103, 108, 109, 119, 120, 121, 123) ordered into a plurality of levels beginning with a top level branch (103), each of said branch nodes including an array of adaptable objects (104, 105, 110, 112, 111, 114-118) each associated with a subexpanse of said indexes mapped by a respective one of said branch nodes, said adaptable objects each including a type field (T) indicating a type of said adaptable object, said type including a pointer type (116B) in which said adaptable object is configured to include a pointer to another node and an information data field (116A) configured to store information about said other node, and an immediate type in which at least one of said indexes is stored in said adaptable object.